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Functional Memory and Its 
Microprogramming Implications 



PETER L. GARDNER 



Abstract — Functional memory (FM) is a general -purpose systems 
technology and has been proposed as a solution to the problems of 
large-scale integration. It is based on an associative array, composed 
of writable storage cells capable of holding three states; 0, 1, and 
DON'T care. The functional memory module can be used either as a 
local store, control store, associative store, or logic block. In its use as 
a logic block, logic is performed by associative table lookup, using the 
DON'T CARE state to give significant compression of tables over con- 
ventional two-state arrays (typically n to n 2 words for functional 
memory instead of 2" words for conventional two-state arrays). The 
basic properties of functional memory are described in [1]. 

This paper is in two parts. The first part retraces some of the central 
material in [!}, examines the reasons for table compression, and ex- 
tends the techniques of table design. Following from a discussion of 
static tables, the treatment of time-dependent tables leads naturally 
to microprogramming. The second part identifies, two problems with 
microprogramming and suggests ways to overcome them. The prob- 
lems are branching delay and field combination. The branching delay 
problem is the delay inherent in using a data flow condition to modify 
the address of the next control word to be fetched. The solution is 
seen in the classification of data flow conditions into two classes: 
those that give only minor changes in direction (e.g., the kind of 
differences that exist between add, subtract, end compare), and 
those that give gross changes in direction (e.g., differences between, 
say, ADD and EDIT); and in keeping the minor conditions In the data 
flow and allowing only the gross conditions to return to the control 
store. Conceptually, whereas the gross changes modify the address of 
the control word to be fetched, the minor changes modify the con- 
tents of the control word that has already been fetched. 

The field combination problem occurs when the processing of two 
or more fully or semi-independent (micro) instruction streams is con- 
trolled by a single series of control words, each of which controls the 
whole data flow for a cycle. The problem is that the number of control 
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words necessary is related to the product of the independent in- 
stances, and may therefore be very large. 

The solution Is seen in the provision of multiple control streams 
with linking only where necessary. The number of control words 
necessary would then be related to the sum of the independent 
instances, and should therefore be smaller. 

In discussing these problems, functional memory is not seen as the 
solution, per se, but as clarifying the problems and making the solu- 
tion easier to find. The flexibility of functional memory is also seen 
as a stimulus to the further development of microprogramming. 

index Terms — Arrey logic, associative memory, DON'T CARE state, 
functional memory, LSI, microprogramming, modular memory, table 
lookup. 



Part I: Functional Memory and Logic 
A . Why A rray Lay ic ? 

THE problems of large-scale integration (LSI) have 
been stated many times, e.g., [1 ), [2], and need not be 
repeated here. Suffice it to say that, because of these 
problems, the following properties are desirable for a logic 
module made in LSI : 

1) high circuit density 

2) regularity of interconnections between circuits 

3) high circuit-to-edge connector ratio 

4) loadable, i.e., have easily changeable personality; this 
is to ensure high usage of module type 

5) easily testable. 

The circuit organization that fits these requirements best 
is a storage array. Functional memory attempts to solve the 
problems of LSI by using a storage array to perform logic 
economically. 
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B. Conventional Array Logic 

Traditional methods of performing logic on two operands 
by table lookup have been acceptable for narrow operand 
inputs, but have been prohibitively expensive for wide 
operand inputs because the number of words required has 
been related to 2 2n , where n is the operand width in bits (for 
example, related to 65 536 words for byte-wide operands). 
These methods have used either two-state directly addressed 
table lookup or two-state associative (content addressable) 
table lookup. 

Assume that the Boolean OR function of bit pairs is to be 
performed on two two-bit operands, as represented in Fig. 1 . 
The two-state directly addressed memory would require 16 
words of two bits, as shown in Fig. 2. The two-state associ- 
ative memory would require 1 5 words of six bits, as in Fig. 
3, two bits for read output, as before, plus four bits for 
search input in place of address decoder. 

Note that, because the information stored in the table is 
not dependent on position within the memory, the search 
input 0000 can be omitted, since its read output 00 is no 
different from the absence of output (unless checking is 
performed by a test on one and only one word selected). 

C. Functional Memory Array Logic 

In functional memory, where each cell position is capable 
of holding a binary 1 (matches a 1 input), a binary 0 
(matches a 0 input) and a don't care (matches a 0 input or 
a 1 input), the same table compresses to four words, as in 
Fig. 4. The don't care state is represented by X. 

Note that in functional memory, multiple selection of 
words is possible. The associated read outputs are ORed to 
form the result. 

For the or of bit pairs of two operands the number of 
words required is the following. 



Operand 
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Two-State 


Functional 
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Associative 


Memory 


(bits) 


(words) 


(words) 
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The number of words required varies per function. For 


example, the and requirements are as follows. 
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It can be seen from these figures that the degree of table 
compression possible with functional memory can be very 
significant. Before looking at the effect of functional 
memory on other, more complex functions, let us establish 
what it is that gives functional memory this advantage. 

D. Why Functional Memory Tables Compress 

The characteristics of functional memory that allow the 
compression of logic tables are as follows. 
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Fig. 1 . Bit pair ORing of two two-bit operands. 



ADDRESS INPUT 




STORED 
TABLE 





0 0 




0 1 




0 1 




0 1 




1 .0 




I 1 
1 1 

1 0 

1 I 
1 1 


SELECTED 


1 1 




1 0 

1 1 
1 1 
1 1 



SAMPLE INPUT 



RESULT 



Fig. 2. Two-state directly addressed table lookup. 
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1) Independent subfields: Each result output bit is treated 
separately. Only those input bits which should contribute to 
a particular result output bit are allowed to contribute. All 
others are deliberately excluded. The result output bit with 
its relevant input bits are together regarded as a subfield 
and an independent table is built for each subfield. All sub- 
fields are processed simultaneously and the result output 
bits put together by concatenation. For instance, in Fig. 1, 
A 2i B 2i and R 2 can be regarded as a subfield and A u B u 
and R l as another. The table could be represented as in Fig. 
5. 

2) Combinations of interest: Within each subfield only 
those combinations of input bits that actually produce an 
output are included. Combinations producing a 0 output 
are excluded. 

3) Compression of combinations to find prime implicants: 
The combinations of input bits that produce an output with- 
in a subfield are examined to find the prime implicants. 
For instance, if from inputs A, B, C, and D combinations 
A ■ B ■ C • D. and A B - C • D both produce the same out- 
put, then only one entry need be made for the two combina- 
tions. A - B - C is entered and the D position is stored as a 
don't care. This is the obvious use for the don't care state 
of the cell. 

Not all of these three characteristics are exclusive to func- 
tional memory (or to any other three-state associative 
organization). The first, independent subfields, is possible 
with both two-state directly addressed memories and two- 
state associative memories, and it is this characteristic, out 
of the three, that probably gives the greatest table com- 
pression. 

Subfields could be processed independently by two-state 
directly addressed memory and two-state associative 
memory by providing a separate, smaller array for each sub- 
field and concatenating the result output bits. The unwanted 
subfields would therefore be don't care by omission. Thus 
the total number of words required for a function would 
then be related not to the product of the number of words 
required for all independent subfields (as at present), but 
to the sum (as for functional memory). 

However, there would still remain the problem of decid- 
ing on the number of small arrays to provide (i.e., how 
many independent subfields to allow for) and on their input 
width and number of words. This decision would vary ac- 
cording to function and would have to be made at manu- 
facturing time. With functional memory, however, because 
one don't care is provided for each cell position instead of 
one for each subfield, no such decisions have to be made, 
and it is this factor which makes functional memory attrac- • 
tive as a general -purpose logic block. 

With associative arrays, whether two- or three-state, some 
of this effect could be achieved by providing a don't care 
per column (i.e., one per each bit of width). This, of course, 
is masking. The problem would remain, however, of decid- 
ing at manufacturing time on the number of words over 
which each mask should have its effect, i.e., the number of 
words required for each subfield. It is interesting to note that 
the don't care state of the functional memory cell is in 
effect a search mask per bit position. 
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Fig. 5. Illustration of independent subfields. 



The second characteristic, the inclusion of only those 
..combinations of interest, is common to any associative 
memory, whether two- or three-state. 

The third characteristic, the compression of combinations 
to find prime implicants, is a function of the don't care 
state of the cell (as opposed to the don't care state of the 
subfield) and is not shared by the two-state memories. 
E. The Functional Memory Module 

Fig. 4 showed the basic operation in functional memory 
whereby operands to be processed are presented as search 
input to the search input field of an array holding bit pat- 
terns appropriate to the function to be performed. The bits 
read from the read output field are the result. 

In functional memory a register, called an I/O (input/ 
output) register, holds the search input during SEARCHing 
and the result during READing. 

Although the search input field and the read output 
field are shown as separated, there is no physical boundary 
between them inside the module. The search and read 
fields can be of any width (within the limits of the module) 
and may overlap in any way. This gives generality and 
greater flexibility in choice of field widths than a fixed par- 
tition would allow. However, this generality must be paid 
for in two ways (see Fig. 6). 

1) Two-phase cycle: The result of the search is set into 
a register of latches, called selectors. There is one selector 
per word in the array. The process of searching the search 
input field of the array from the I/O register and of setting 
the result into the selectors is called the select phase. 

The second phase is the read/write phase. During this 
phase, if READing, the selector contents are used to access the 
array a second time and the result is set into the I/O register. 

The module cycle is made up of a select phase followed 
followed by a read/write phase. 

2) Masking: The search input field and the read out-, 
put field are defined for each cycle by a pair of masks, the 
search mask and the. read/write mask, respectively. The 
mask pair required for a given cycle is chosen from a mask 
stack by a set of mask address bits presented to the module 
at the start of the cycle. The mask stack is resident in the 
module and is loaded, together with the array, at initial 
load. 

The selector register is defined as a shift register, the shift- 
ing being performed at the end of the select phase by the ex- 
ternal control next, which is provided to the module at the 
beginning of the cycle. There are two important uses for 

NEXT. 

a) Loading: During initial load, words to be loaded 
cannot be addressed by content. NExring is an alternative 
method of addressing. 
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Fig. 7. Functional memory system cycle. 
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(a) select phase (search operation), (b) read/write phase 
(read operation). 



b) Word width : Should the total width of SEARCH input 
plus independent read output exdeed the total word width 
available in the module, then the search input patterns and 
the read output patterns can be interleaved such that the 
read output field for each table "word" is placed "next" to 
its corresponding search input field. 

After a read operation the result of the current cycle 
either remains in the I/O register of the same module or is 
passed via the data bus to another module for accessing 
another table in the following cycle. The functional memory 
system is therefore in three parts, as in Fig. 7. 

The I/O register communicates with the data bus by 
means of a set of bidirectional data pins. There is one data 
pin per bit position in the I/O register. Although only one 
I/O register and one set of data pins is necessary for the cor- 
. rect working of the functional memory module, it has been 
found that a significant improvement in array usage can be 
achieved by adding a second I/O register and a second set 
of data pins. The choice is made at the start of the cycle, 
under external control, as to which I/O register is to provide 
the search input and which is to receive the read output. 
Pipelining through the array is thus possible. 

In addition to the two sets of data pins, the module also 
has mask address pins (for addressing the mask stack), con- 
trol pins (to control choice of I/O register for select phase, 
choice of I/O register for read/write phase, whether to 
search, next, read, or write, etc.), and pins to carry 
power and clock signals. 

The data pins, mask address pins, and control pins all 
have compatible voltage levels on the bus. An FM data flow 
is composed of one or more FM modules with their data 
pins, mask address pins, and control pins connected to- 



— Search Input M1 " 

Fig. 8. Complete logical table. 

gether as required. Control of the data flow can either be 
from a control store or from the data flow itself. The provi- 
sion of the control from the data flow itself is known as 
autosequencing and is a powerful technique in functional 
memory. 

Only those properties of functional memory necessary to 
understanding have been described above. Flinders et al. 
give a more detailed description of the module functions, 
together with a discussion of the checking, diagnosis, and 
recovery aspects. 

F. Table Design in Functional Memory 

Fig. 4 showed a functional memory table for performing 
a bit-pair or of two two-bit operands. When two or more 
tables exist simultaneously in the same module, or in the 
same group of modules common-wired (known as a 
"store"), they are distinguished from each other by means 
of a table name or "tag." This is a field of bits (0, 1, or 
don't carje), stored in the array together with the tables at 
initial load, which form part of the search input field and 
which is searched by a "key" during the select phase. Thus 
the relationship between the key in the current cycle and the 
tags stored beside the tables defines the function to be per- 
formed on the operand(s) in the current cycle. 

The tables in Fig. 8 (repeated from [1 ]) provide the four 
bit-pair products of two three-bit operands, A 2 A 2 A { and 
B^B 2 B U with their tags. 

The tables can be used individually (for example, key 0010 
will select table A n • B n and exclude all others) or in com- 
bination to select any of the 16 possible logical functions of 
two variables. The contents of the I/O register in Fig. 8 
show the key and operand input bits (together forming the 
search input) and the result output for an xor operation. 

For clarity, the figures illustrating tables hereafter use 
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blanks to represent those cells holding the X state. Only 
cells holding a 1 read out as 1 ; cells holding a 0 or an X read 
out as 0. 

In designing a table for a particular function, it is neces- 
sary to express each result bit (in true form) as the logical 
sum of product's of the input bits, all factors having been 
removed, and then map the equations into table form. 

For instance, if the four-bit operand A 4 A 2 A 2 A 1 is to be 
incremented by one in its least significant position (Aj) and 
the five-bit result R s R 4 R 2 R 2 Ri is to be output, we would 
need to express the function as 

0) 

R i = X 2 :A 1 + A 2 -A l (2) 
fl 3 = A z ■ A 2 • A l + A 3 • A 2 + A 3 • A i (3) 
R A = A A - /4 3 • A 2 - A x + A 4 - A 3 + A 4 - A 2 + A 4 - A t (4) 
R 5 =A 4 AyA 2 -A l . (5) 

This is then mapped into a table such as Fig. 9. Note that 
incrementing by means of this table requires only one cycle, 
since carry ripple is dealt with in the table. 

Note also that an increasing number of words need to be 
added to the table as each new bit of input width is added : 

1 + 2 + 3 + 4 + • ■ ■ n( + 1 for carry), 
i.e., a total of 

11 for a four-bit operand 
37 for an eight-bit operand 
r 137 for a 16-bit operand 
(n + 1) x n/2 -J- 1 for an /i-bit operand. 

Provided that it can be written as the logical sum of prod- 
ucts, any function can be mapped into a one-cycle func- 
tional memory table. However, some functions require a 
large number of words (such as 137 words for the 16-bit in- 
crement); for some other functions (such as binary add), 
the number of words required is so large that one is forced to 
design tables for two or more cycles, the intermediate re- 
sults from the first table being passed as input to the second 
table. 

It is reasonable, therefore, to search for additional 
module functions that can increase the logical power of the 



module while remaining generally useful, read right xor 
left (xor means exclusive or) has been found to be such 
a function. 

G, Functional Memory Ceil and READ RIGHT XOR LEFT 

Before illustrating the uses of read right xor left in the 
design of functional memory tables, a brief discussion about 
the FM cell is necessary. 

The functional memory cell is composed of two bistables, 
named "left bistable" and "right bistable," each connected 
to its own bit line ("left bit line" and "right bit line," re- 
: spectively), and both connected to a common word line. 
The cell states are represented as follows. 



Cell State Left Bistable Right Bistable 



0 1 0 

1 . 0 1 
X 0 0 
Y 1 1 



(The Y state is not essential to the functional memory con- 
cept, but can be useful, as in the use of read right xor 
left.) 

A column consists of a left bit line and its corresponding 
right bit line together with all the bistables connected to 
them. A word consists of a word line together with all the 
bistables connected to it. Columns and word lines, when 
put together as an orthogonal matrix, form the functional 
memory array (see Fig. 1 0). Each column in the array is con- 
nected to a corresponding bit position in the I/O register. 

During search the bit lines are driven according to the 
polarity of the corresponding I/O register bit and search 
mask bit for that cycle; thus we have the following. 





search Mask 






I/O Register Bit 


Bit 


Left Bit Line 


Right Bit Line 


0 


1 


lower 


raise 


1 


1 


raise 


lower 


any 


0 


lower 


lower 



During search each bistable gates out a mismatch signal 
onto its word line if and only if it contains a one and its bit 
line is raised. Mismatch signals gated onto a given word line 
are ORed and the inverse of the or is set into the correspond- 
ing selector to indicate "word selected." 

During read the selected word lines are driven and the 
bistables that are attached to them output onto their corre- 
sponding bit lines if and only if they contain a one and their 
word line is raised. The signals gated out onto a given bit 
line are ORed. 

With the tables shown so far and in [1 ], the information 
ORed onto the left bit line was ignored during read. The 
ORed information on the right bit line was set into the corre- 
sponding I/O register bit position (having been gated by 
the read/write mask). Hence only cell state =1 had any 
effect on the result of a read. (Cell state = Y read as = 1 , and 
was therefore not used.) 

With read right xor left the ORed information on the 
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Fig. 10. Functional memory cell and array. 
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Fig. 11. read right XOR left on single functional memory column. 

right bit line is xoRed with the ORed information on the 
corresponding left bit line before being gated by the mask 
bit and set into the corresponding I/O register bit position. 
Thus cell state = 0 can now have an effect on the read result. 
Also, cell state = Y, if read out onto a column, can now be 
used to inhibit the output of that column (see Fig. 1 1). 

H. The Module Function 

With read right only the result bits were expressed as 
the logical sum of products 

R k = (A x • B x • C x Nj) + (A 2 • B 2 C 2 ■ : • • • N 2 ) 

+ ~- + (A m -B m -C H iVJ (6) 

where each element K t could be true (=1), complement 
( = 0), or ignore ( = X). This equation was true whether the 
table was in one module or . spread over several with the 
outputs of each being dot-ORed on the data bus. In ab- 
breviated notation the total module "function" could be 
expressed as 

. + . + . + •-. + ■ + . (7) 
where • represents a product, i.e., one word's worth of 
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Fig. 12. Karnaugh maps for read right only. 
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Fig. 13. Case 3 with read right only. 

matching and selection, and + represents the ORing of the 
readout either onto the bit lines (right bit line only) or onto 
the bus. 

With read right xor left the total module function can 
be expressed as 

[{(• + • + • ■ ■)¥(• + • + ■ ■ )}"](• + • + • ■ -)] + [same] 
| | i + [same] 

12 5 +■■■ (8) 



T 
4 



where 1 represents a product, i.e., search and match; 
2 represents ORing of readout onto a given bit line ; 3 repre- 
sents ORed information on, say, right bit line; 4 represents 
ORed information on, say, left bit line; 5 represents right 
xor left, Y means exclusive or ; 6 represents READing out 
the Y state contained in one or more cells in a given column ; 
• ] means and not; and 7 represents ORing of I/O register 
bits on the data bus. 

The use of the Y state has been shown as separate in 6 
(above) in order to show its effect more clearly. The physical 
operation is such that the logical sum of products in bracket 
6 is actually included in brackets 3 and 4 at the same time. 

/. read RIGHT XOR LEFT Explained by Karnaugh Maps- 

The effects of read right xor left can be demonstrated 
on Karnaugh maps in Fig. 12. Case 1 is "complete." Case 2 
is "nearly complete." Case 3 is "nearly complete plus a bit 
more." In all cases read right only requires building up 
the function additively. For instance, Case 3 would map 
into a read right only table as the four words in Fig; 13. 

Whereas with read right only the designer is obliged to 
build up the table additively, selecting only those prime im- 
plicants that he wants, with read right xor left he has 
the option to build additively or to select more than is 
wanted and to remove the unwanted portions. For example, 
Case 1 remains unchanged since it is complete already. 
Cases 2 and 3, however, can be expressed as in Fig. 14. Both 
Case 2 and Case 3 would map into only two words each 
with read right xor left, as in Fig. 15, contrasting with 
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Fig. 15. Cases 2 and 3 with read right xor left. 
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Fig. 16. Odd parity, read right only. 
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Fig. 17. Odd parity, read right xor left. 

the three and four words, respectively, with read right 
only. Note that in Fig. 15, Case 3, the result values 1 and 0 
are interchangeable, provided that A outputs one value and 
B - C D outputs the other. 
- The most expensive function for the logical sum of prod- 
ucts method of table mapping and the function for which 
read right xor left is most obviously helpful is parity. 
This function can be represented as ail the black squares on 
a chess board, as in Fig. 16. With read right only the table 
designer must select individually every input combination 
which produces a certain parity (say, odd). No compression 
is possible and within the subfield the X state is useless. 
With read right xor left, however, odd parity could be 
represented, as in Fig. 17. The words required for a parity 
table are therefore the following where n/2 must be rounded 
up to the nearest integer. 



READ RIGHT ONLY READ RIGHT XOR I.I IT 

(bits) (words) (words) 



8 

128 



4 

16 



J. Tables Using HhA !) RKUfT XOR LEFV 

The foregoing was an attempt to show the effects of read 
right xor left in principle. Two examples should serve to 
show its effect in practice. 

/) Increment: Four-bit operand A 4 A 2 A 2 A i incremented 
by one: Equations (I )-(5) in Section F expressed each result 
bit as the logical sum of products, as required by read 
right only. With read right xor left the equations can 
become 

R t =A, (9) 

R 2 = a 2 V A x (10) 

*3 - A; ¥ (A 2 -A l ) (11) 

R 4 = A 4 Y (AyAj-A,) (12) 

R s = A 4 A Z A 2 A V (13) 

These equations can map into a 2n word table, compared 
with 1) xrt/2-H words required for read right only, 
For example, a 16-bit increment is now reduced from 137 to 
32 words. 

Although this reduction seems impressive, one can go 
• further. Using the fact that (avb)=(a v5), the equations 
for the increment result bits can now transform to 

*i (14) 

R 2 = A 2 Y A, (15) 

R 3 = A 3 *(A 2 + A X ) (16) 

R 4 = A 4 v (4 3 + A 2 + A x ) (17) 

i? 5 = l¥(l 4 + 4rl 2 + A x ). (18) 

The lable for increment now has the rather unexpected form 
given in Fig. 18. 

Final reduction figures for increment are the following. 

Two-Stale 

Directly Functional Memory Functional Memory- 
Operand Addressed read right only read right xor 
Width (bits) (words) (words) left (words) 



4 
8 
16 



16 

256 
65 536 



11 

37 
137 

1« + I)x« 2+1 



5 

9 
17. 
n+l 



2) One-cycle compare: Comparison of two operands, 
A 4 A 3 A 2 A ! and B 4 B 3 B 2 B ly for A >B can be expressed as 

A >B=A 4 • B 4 + A 3 • E 3 ■ (A 4 + B 4 ) + A 2 - B 2 • (A 4 + fi 4 ) 
• (Ai + Bj + A^ErlAt + BJ iAi + BJ ^ + B,). (19) 

With read right only, the expression must be multiplied 
out and 2"- i words are required for a one-cycle compare 
(where n is the width of each comparand). An additional 
2"- 1 words are required for A < £, clearly unacceptable for 
medium to wide comparands. 

With read right xor left the expression can be trans- 
formed to 

A>B — A 4 B 4 -\-A 3 B 3 ' \A 4 • B 4 )+A 2 ■ B 2 ■ ~\(A 4 • B 4 

+ A 3 >B 3 ) + A r B l 'l(A 4 B 4 + A 3 B 3 + A 2 B 2 ) (20) 
where • ] means and not. 
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Input 
A 4 *3 A 2 *1 



Output 



0 0 0 

0 0 0 

0 0 1 

0 1 

) 



Fig. 18. Increment, read right xor left. 
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Fig. 19. Compare, read right xor left. 
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Fig. 20. Compare. Reduced width. 



By allowing a separate output column for each major con- 
junct, and by dot-ORing those columns on the data bus, the 
2/1+1 word table in Fig. 19 will provide not only the re- 
quired A > B, but A < B and A = B as well. 

The tradeoff is thus a small (linear) number of columns 
against a large (2") number of words. Nevertheless, although 
the tradeoff is clearly worthwhile, word width is valuable. 
Therefore the following refinement should also be adopted. 

Using the fact that A k -E k can also be expressed as 
A k - B k - \A k • £ fc ), (20) can be reexpressed as 

A > B = (A 4 • B 4 + A 3 ■ S 3 ) ■ "|M 4 • B J + (A 2 • B 2 

+ A, • B x ) ' 1(^4 * *4 + ^3 ' *3 + ^2 * B 2 ) (21) 

and the table in Fig. 1 9 will reduce in width to that in Fig. 20. 

Part II : Functional Memory and Control 
A. Introduction 

The discussion so far has been about the logical aspects of 
functional memory — how tables can be designed to per- 
form required functions. Most of the functions have been 
performed in a single cycle. However, some functions, such 
as binary addition, may require two or more cycles for the 
final result to emerge. . 

This introduces the idea of time-dependent tables. The 
second table must wait until initiated by the output of the 
first table. Alternatively, the second table can be initiated 



after a certain interval of time (i.e., a certain number of 
cycles) relative to a reference point. In both cases the stim- 
ulus is fed either to the table as part of the search input or to 
one of the external control lines of the module. A functional 
memory data flow is a collection of modules holding single 
cycle or sequenced tables (together with locations used for 
data storage, status information, etc.), joined together on 
the data bus as required. The sequencing of the use of the 
tables is performed by microprogram. 

The microprogram of a functional memory data flow 
could be in a conventional control store with conventional 
sequencing and with the functional memory external con- 
trols and table keys each having their own control fields in 
the control word. This would make a reasonable system, the 
functional memory having replaced the logic of a conven- 
tional data flow. 

But what would be the effect if the microprogram itself 
were in functional memory? It could be used to provide a 
conventional control store organization, as in the previous 
paragraph. The sequencing could be explicit — some of the 
bits read out in the current cycle being used to address the 
immediate successor by content-addressing, or implicit- 
using NEXTing to address the successor by position. Two 
control words could be stored in each functional memory 
word and distinguished from each other by read right 
xor left through an appropriate filter of all 1 's or all 0's, or 
by using read modes right only and left only. 

However, using functional memory in this manner would 
be an expensive way of providing a conventional control 
store (the extra expense might be justified by other con- 
siderations, such as wanting a unified procedure for loading, 
checking, diagnosis, recovery, debug, etc.). Furthermore, it 
would also be to ignore the interesting implications that 
functional memory has for microprogramming. 

Only two of these implications are discussed here. These 
concern the branching delay problem and the field com- 
bination problem. In this discussion functional memory is 
seen not as the solution to the problems per se, but as clarify- 
ing the problems and making solutions easier to find. 

B. Branching Delay Problem 

The conventional procedure for a microprogram branch 
is for a condition or conditions from the data flow to be 
passed to a black box to modify in some way (it may be a 
transform or merely concatenation) the address produced 
at the end of the preceding cycle. The control for the current 
cycle cannot start until the modified address is formed and 
the required control word accessed. This means either that 
the data flow and control store cannot be run at full speed 
and must be interleaved to some extent [see Fig. 21(a)], or 
that leap-frog branching must be accepted [see Fig. 21(b)]. 
In both cases delay is involved when an immediate branch 
is required. 

This section does not attempt to eliminate branches that 
cause this delay, but to show that their inclusion in the 
microprogram is often unnecessary and can usually be 
avoided. In computer processing, decision making and 
branching are not synonymous. Consider three cases. 

Case 1: When, in a conventional data flow, register P 
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Fig. 21. (a) Control store and data flow interleaved, 
(b) Leap-frog branching. 

and Q are controlled to provide their contents to an ALU 
(arithmetic and logic unit) to be added together, no branch- 
ing need take place while the data pass through the ALU. 
Nevertheless, decisions are being made in the logic of the 
ALU; in other words, there is a difference between the way 
the logic behaves when, say, P=3 and Q = 5 and when 
F = 3 and Q = 6. In this case it is most unlikely that the 
microprogram would branch. 

Case 2: A further input to the ALU is the carry in. Its 
value affects the decision making within the ALU in the 
same way as the values of P and Q. If the ALU is narrower 
than the data widths then the data must be sectioned and the 
carry out of each section must be entered as carry in to the 
successor section. In this case it is quite possible that the 
decision to enter the carry or not might be made by a micro- 
program test and branch. 

Case 3 : Depending on the programmer instruction being 
performed, the control to the ALU may require to be add 
(as for add) or subtract (as for subtract or compare) or 
some other function (e.g., as for logical instructions). In 
this case it is very likely that the decision will be made by a 
microprogram test and branch. 

As presented, these three decisions differ in their "level" 
within the machine control, and hence in the degree of 
probability that, in a conventional data flow, they will be 
treated by microprogram test and branch. 

In Case 1 the microprogram might read "take the values 
found in registers P and Q and add them together." Param- 
eters are being used, in this case P and Q. 

A (micro)program might be described as a sequence of 
steps to be taken by the unit being controlled, using param- 
eters the values of which are not known when the (micro) 
program is written. It is the use of parameters instead of 
exact values that allows for variety and decision making 
within the same basic "shape." 

There is nothing fundamentally different between Cases 
1, 2, and 3. All three could use parameters and avoid 
branching. An example might read "If condition C exists, 
do function specified by value in F (register or data bus) to 
contents of register P and Q." 

Whereas branching involves taking conditions from the 
data flow and returning them to the control store to modify 



the address of the succeeding control word, the use of Model 40 

parameters within a jingle shape could be regarded as The coi 

keeping the conditions in the data flow to modify the con- logically i 

tents of the succeeding word. operand - 

Thus, if parameters are used, the following is true. where the 

1 ) The succeeding control word can be fetched before the unless ea 
value of the condition is known. This saves time. own ind< 

2) The same control word will do for two or more dif- single-ad 
ferent paths within the same basic shape. This saves control provide c 
bits. of combi 

The system designer should pitch the level of his micro- Consk 

program language to suit the system architecture according shown ir 

to the following principle. compose 

Principle 1 : Data flow conditions are of two types : those them wh 

that give only minor changes in direction (e.g., the differ- of the fi> 

ences that exist between add, subtract, and compare) and 1) Ad 

those that give gross changes in direction (e.g., differences ments a< 

between, say, add and edit). Minor conditions should be 2) Wi 
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lowed to return to the control store. 3) Lc 

Although any data flow organization can apply this register* 

principle, functional memory is particularly well suited to 4) M 
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ventional microprogram data flow. Now much of the con- three fu 

trol can also be integrated into the data flow. Decisions are A i 
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which the decisions depend and the place where the action L 
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In this respect functional memory forms an interesting all thre 

bridge between conventional microprogram-controlled and field d< 
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C. Field Combination Problem 

Although not fundamentally different from each other, 
program and microprogram tend to differ in one important 
respect. In general, program is concerned with only one 

facility or element of the system at a time; microprogram It is HI 

can be concerned with several at the same time. in di ff 

Although in microprogram all the facilities or elements of 
the data flow may be contributing to the overall goal (for 
example, the particular programmer instruction being 
microprogrammed), the elements are often working in 
parallel on portions of the microprogram flow that can be 

considered independent of each other for the time being Sin 

(say, for the following n cycles). the tc 

The truth of this is most obvious in a pipelined organiza- relate 

tion where there may be, say, three different programmer in- j n e a« 

structions in different stages of decomposition in the data if, 

flow at any one time. It is equally true, however, in a normal ara te 

single-instruction stream, for instance with a System/360 trol s 

RX instruction, where an operand from mainstore is pro- WO ul 

cessed with an operand from a programmer register. From en t v 

the architecture point of view, there is no constraint on the Le 

order in which the operands are fetched: operand 1 could p t 

be fetched first, operand 2 could be fetched first, or both tify 

operands could be fetched at the same time. System/360 arc h 
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Gardner '. functional memory and its microprogramming implications 

Model 40, for example, fetches both at the same time. 

The control sequence for fetching operand 1 is therefore 
logically independent of the control sequencing for fetching 
operand 2, and this independence continues until the point 
where they are brought together for processing. However, 
unless each of the independent control sequences has its 
own independent addressing structure, the conventional 
single-addressing, .single-control store organization can 
provide only pseudo-independence, and that at the expense 
of combination. 

Consider a conceptual data flow for a CPU such as that 
shown in [1, Fig. 17] and repeated here as Fig. 22. It was 
composed of five boxes and the interconnections between 
them which need not concern us here. The main functions 
of the five boxes were as follows. 

1) Address store (functional memory): forms and incre- 
ments addresses for main store. 

2) Work store (functional memory) : acts as ALU and op 
decoder. 

3) Local store (functional memory) : holds programmers 
registers and provides work space. 

4) Main store (conventional). 

5) Control store (assume conventional). 

The data flow could be divided, for control purposes, into 
three functional areas which we will call 

A main store and addressing 

W ALU 

L . registers and data. 

In a conventional control store, each word would control 
all three functional areas for one cycle by having a control 
field devoted to each functional area and the appropriate 
values in each of the control fields, as in Fig. 23. Each field 
will have a set of values used : 
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It is likely that most of these values will appear many times 
in different combinations : 

'a x% w 7 ;l 5 

Since one control word is necessary for each combination, 
the total number of control words necessary in the control is 
related to the produce of the number of different values used 
in each field. This is seen as the field combination problem. 

If, on the other hand, the control fields were to be sep- 
arated from each other and each field given a,separate con- 
trol store, then the total number of control words necessary 
would be related not to the product of the number of differ- 
ent values used in each field, but to their sum. 

Let us extract a second principle. 

Principle 2: The system designer should attempt to iden- 
tify those areas of control which, though concurrent, are 
architecturally independent of each other and provide a 
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Fig. 23. Control fields. 

separate source of control for each independent area for the 
duration of their independence. 

There are several arrangements for providing indepen- 
dence. 

I) Each control store full width (Fig. 24): Each control 
store is capable of controlling the whole data flow, but dur- 
ing any one cycle may control only one of the functional 
areas, thus we have the following. 
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2) Each control store only field width (Fig. 25): Each con- 
trol store is capable of controlling only one functional area 
of the data flow in any one cycle. The functional area to be 
controlled is determined by a field code which is read out 
with the field control value. 

3) Control store per functional area (Fig. 26): Each con- 
trol store is dedicated to a particular functional area. As 
data pass out of the control of one functional area, so too 
does addressing information from the control store of that 
functional area to the control store of the functional area to 
which the data are being passed. Each control store is thus 
sequenced not only by its own next address field, but also by 
next address information from the other control stores. 

Of these three arrangements, the first and third seem of 
particular interest. The second arrangement would require 
an additional stage of decoding to decide which functional 
area is to be controlled. It is therefore not recommended. . 

An interesting difference/which can be illustrated in a 
pipeline organization, exists between the first and third ar- 
rangements. In the first, each control store could be devoted 
to a programmer instruction and could control it through 
the data flow from start to finish. (Clashes between instruc- 
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tions which cause holdoffs would presumably be detected in 
the data flow, and returned to the control stores as holdoffor 
branching information.) There would, therefore, be as many 
control stores as the maximum number of instructions in the 
pipeline at any one time. With arrangement 3, on the other 
hand, since each control store controls a particular func- 
tional area, responsibility for each instruction is passed 
from one control store to another (one or many) as the in- 
formation relative to that instruction passes around the 
data flow. 

Although the conventional two-state directly addressed 
stores could be used to implement these control store ar- 
rangements, functional memory is particularly well suited 
to them. First, in its use in the data flow, functional areas are 
easier to define and isolate with functional memory than 
with conventional logic. This is because the elements of a 
conventional data flow tend to be defined according to their 
logic specialization (ALU, shifter, staticizers, etc.). With 
functional memory, on the other hand, logic function can 
be more diffuse and can be assigned to functional areas ac- 
cording to architectural requirements. Thus, for example, 
functional area A (main store and addressing) can contain 
tables for address formation, incrementing, and address 
specification testing, as required, and also locations for 
temporary storage of addresses. Second, in its use for con- 
trol, arrangement 3 is well suited to the idea of integration 
of control into the data flow, as suggested in the discussion 
on the branching delay problem. However, it is in arrange- 
ment 1 that the implications of functional memory on 
microprogramming can be seen most clearly, even if not to 



best advantage (a mixture of arrangements 1 and 3 would 
probably be optimum). 

It may seem wasteful to have only one third of each con- 
trol store width active in any one cycle, but let us recall the 
discussion in Part I-D on the compression of tables. It was 
said there that probably the greatest degree of table com- 
pression results from the provision of independent logic sub- 
fields. What is being provided here by arrangement 1 is in- 
dependent control fields, with no decision having to be made 
concerning the placing of those control fields. 

It was suggested for arrangement 1 that each control store 
would control only one functional area in any one cycle. A 
natural extension is that in any one cycle any control store 
could control any number of functional areas from 0 to 3; 
thus .we have the following. 
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A further extension is to place all three control stores into 
one functional memory store, and read out up to three 
words at the same time. There could then be a variable num- 
ber of control stores, the number provided depending on the 
number of independent functional areas required from cycle 
to cycle. For instance, in the System/360 RX example, a 
single control sequence, with only a single word being read 
out each cycle, might precede the fetching of the operands. 
At that point the control could split into two independent 
sequences, two words being read out for each cycle, until 
they came together again as a single sequence for processing. 

In these examples the control fields being read out in a 
cycle are put together by concatenation. Yet a further ex- 
tension is possible if the control fields are put together in 
some other way, for example by ORing the fields or xoRing 
using the read right xor left. 

Conclusion 

Functional memory is basically an array. The circuits are 
therefore densely packaged and the interconnections be- 
tween them are regular. 

The array is associative and the cells are able to hold a 
don't care state in addition to the normal binary 0 and 1 . 
Functional memory is therefore able to perform logic by 
table lookup very much more cheaply than conventional 
two-state table lookup methods. Techniques like read 
right xor left provide even greater table compression. 

Logic personality is loadable and easily changed. Module 
manufacture is therefore independent of the function that 
it is to perform in a system. Because of this independence, 
because it is an array, and because it can perform logic, 
functional memory is an attractive way of applying LSI to 
the field of random logic. 

Functional memory words can be written during pro- 
cessing. It can therefore be used for local storage, working 
registers, status information, etc. 

A functional memory data flow consists of a collection of 
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functional memory modules connected together. Because 
the elements of the data flow are all operating synchro- 
nously, are all of the same type, and are distinguished from 
each other only by their position in the data flow and by 
their contents, the system designer has the following option : 
either to specialize the elements by logic function (as with a 
conventional data flow) or to diffuse logic function around 
the data flow and form "functional areas" that are related 
to architectural requirements. He will probably do both. 

The voltage levels of the data pins and external control 
pins of a functional memory module are compatible. There- 
fore '"auto-sequencing 11 is possible, whereby some func- 
tional memory modules control others or themselves. Again 
the system designer has the option of controlling his data 
flow by a separate control store (as with a conventional 
microprogram-controlled data flow), or integrating the 
control into the data flow and providing independent local 



control where it would be advantageous. He will probably 
do both. 

Thus functional memory not only seems a good solution 
to the problem of applying LSI to random logic, but also 
opens up new possibilities in the field of microprogram 
control. 
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A Microprogrammed Intelligent 
Graphics Terminal 
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Abstract — -This paper describes a small computer, Interdata Model 
3, that has been microprogrammed to serve as an intelligent terminal. 
The Interdata is connected to a System/360 multiplexor channel with 
a high-speed interface, and uses en ARDS direct view storage tube 
as a display console. The new Interdata target machine is patterned 
after the /360 (including all five instruction formats), but also has 
instructions particularly designed for intelligent terminal pro- 
gramming. These include instructions for character string manipula- 
tion, code conversion, list processing, coordinate manipulation, and 
virtual addressing. A powerful multiplexor channel, which allows the 
programmer to "overlap" I/O to several devices with a CPU program, 
has also been microprogrammed. 

A program to simulate an IBM 2250 display console has been 
written for the Interdata. Although a 2250 is not an intelligent termi- 
nal, the simulator allows existing graphics programs at Brown Uni- 
versity to be run using the Interdata-AR DS without any modifications. 

The intelligent terminal will be used to investigate the appropriate 
division of labor between a central processor and satellite computer. 
Divisional tradeoffs will be studied by allowing the user to transfer 
routines of an applications program between processors at run time 
as a function of loading and response time. 

Index Terms — Instruction set, intelligent graphics terminal, 
microprogramming, minicomputer, multiplexor channel. 
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I. Introduction 
A. Intelligent Terminals 

OVER the last five years one of the main research 
topics in computer science at Brown University has 
been computer graphics. An active research and 
development group has completed a number of interactive 
applications programs, including the Hypertext Editing 
System, 3D-PDP (a program to lay out and analyze indus- 
trial piping systems), and the Flowchart Programming 
System. In all cases a buffered IBM 2250/Model I CRT, di- 
rectly connected to a large System/360 computer, was used 
for interaction. 

Recently we have become interested in replacing the 
2250/Model l with a so-called "intelligent terminal." Such 
a terminal combines display facilities with the ability to do a 
significant amount of local processing. The utility of local 
processing is to relieve the main computer of vast amounts 
of I/O for low-level interaction and to provide the user with 
immediate response to modest computational or data 
manipulation requests. One intelligent terminal configura- 
tion which we have tested was a 2250/Model 4 connected to 
an IBM H30, which in turn was interfaced over a 40.8 
kilobaud line to the/360. This combination was on the one 



